The process by which companies create value for customers and build strong customer relationships in order to capture value from customers in return.
Kotler and Armstrong (2010).
Marketing campaigns are characterized by a focus on customer needs and overall satisfaction. Nevertheless, several variables determine whether a marketing campaign will be successful, and we need to take them into consideration when designing one.
1) Segment of the Population: Which segment of the population will the marketing campaign address, and why? This aspect is extremely important, since it determines which part of the population is most likely to receive the campaign's message.
2) Distribution Channel to Reach the Customer: Which channel will get the most out of this marketing campaign? Which instrument should we use to get our message out? (e.g., telephone, radio, TV, social media, etc.)
3) Price: What is the best price to offer potential clients? (In the case of the bank's marketing campaign this is not necessary, since the bank's main interest is for potential clients to open deposit accounts so that its operating activities keep running.)
4) Promotional Strategy: How the strategy will be implemented and how potential clients will be approached. This should be the last part of the analysis, since an in-depth review of previous campaigns (if available) lets us learn from past mistakes and determine how to make the next campaign more effective.
I know this is a well-known dataset, since it comes from the UCI Machine Learning Repository. However, I believe there are some interesting insights here that you could integrate into your own data analysis. After all, Kaggle is meant for learning from others, and I hope this example suits you well.
Please feel free to use this kernel in your projects; it will be my pleasure!
Also, I'm open to new ideas and things I could improve to make this kernel even better! Open to constructive criticism!
Lastly, I would like to give special thanks to Randy Lao and his well-known Predicting Employee Kernelover kernel, which gave me different ideas on how to approach the analysis of a dataset.
Also, I want to give credit to this Stack Overflow post, which helped me change the legend names of FacetGrids.
https://stackoverflow.com/questions/45201514/edit-seaborn-plot-figure-legend
Check it out if you are struggling with the same problem.
A term deposit is a deposit that a bank or financial institution offers at a fixed rate (often better than a regular deposit account), in which your money is returned at a specific maturity date. For more information on term deposits, see this Investopedia article: https://www.investopedia.com/terms/t/termdeposit.asp
A. Attribute Descriptions
I. Bank client data
II. Related with the last contact of the current campaign
III. Other attributes
B. Structuring the data:
I. Overall Analysis of the Data
II. Data Structuring and Conversions
C. Exploratory Data Analysis (EDA)
I. Accepted vs Rejected Term Deposits
II. Distribution Plots
D. Different Aspects of the Analysis:
I. Months of Marketing Activity
II. Seasonalities
III. Number of Calls to the potential client
IV. Age of the Potential Clients
V. Types of Occupations that Lead to More Term Deposit Subscriptions
E. Correlations that impacted the decision of Potential Clients.
I. Analysis of our Correlation Matrix
II. Balance Categories vs Housing Loans
III. Negative Relationship between H.Loans and Term Deposits
F. Classification Model
I. Introduction
II. Stratified Sampling
III. Classification Models
IV. Confusion Matrix
V. Precision and Recall Curve
VI. Feature Importances (Decision Tree Classifier)
G. Next Campaign Strategy
I. Actions the Bank should Consider
Input variables:
1 - age: (numeric)
2 - job: type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
3 - marital: marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education: (categorical: primary, secondary, tertiary and unknown)
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
8 - balance: balance of the individual (numeric)
9 - contact: contact communication type (categorical: 'cellular','telephone')
10 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
11 - day: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
12 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
Output variable (desired target):
17 - y: has the client subscribed to a term deposit? (binary: 'yes','no')
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plotly import tools
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)
MAIN_PATH = '../input/'
df = pd.read_csv(MAIN_PATH +'bank.csv')
original_df = df.copy()
# Have a grasp of how our data looks.
df.head()
Note: There are not that many insights we can gain from the descriptive statistics, since most of the descriptive information is located not in the numeric columns but in the categorical columns.
df.describe()
Fortunately, there are no missing values. If there were, we would have to fill them with the median, mean, or mode. I tend to use the median, but in this scenario there is no need to fill anything. This will definitely make our job easier!
# No missing values.
df.info()
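For reference, if missing values did appear in a future version of this dataset, median imputation could be sketched as follows (on a hypothetical toy frame, not the bank data itself):

```python
import numpy as np
import pandas as pd

# Toy frame with gaps (hypothetical; the bank data has no missing values)
toy = pd.DataFrame({"balance": [100.0, np.nan, 300.0],
                    "age": [25.0, 40.0, np.nan]})

# Fill numeric NaNs with each column's median (robust to outliers such as extreme balances)
toy_filled = toy.fillna(toy.median(numeric_only=True))
print(toy_filled.isnull().sum().sum())  # 0 missing values remain
```

The median is preferred here over the mean precisely because balance distributions are heavily skewed.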
f, ax = plt.subplots(1, 2, figsize=(16, 8))
colors = ["#FA5858", "#64FE2E"]
labels = "Did not Open Term Subscriptions", "Opened Term Subscriptions"
plt.suptitle('Information on Term Subscriptions', fontsize=20)
df["deposit"].value_counts().plot.pie(explode=[0, 0.25], autopct='%1.2f%%', ax=ax[0], shadow=True,
                                      colors=colors, labels=labels, fontsize=12, startangle=25)
ax[0].set_ylabel('% of Term Subscriptions', fontsize=14)
palette = ["#64FE2E", "#FA5858"]
# Bar heights are the share of the whole dataset in each education/deposit group
sns.barplot(x="education", y="balance", hue="deposit", data=df, palette=palette,
            estimator=lambda x: len(x) / len(df) * 100, ax=ax[1])
ax[1].set(ylabel="(%)")
plt.show()
# Let's see how the numeric data is distributed.
plt.style.use('seaborn-whitegrid')
df.hist(bins=20, figsize=(14,10), color='#E14906')
plt.show()
df['deposit'].value_counts()
# plt.style.use('dark_background')
fig = plt.figure(figsize=(20,20))
ax1 = fig.add_subplot(221)
ax2 = fig.add_subplot(222)
ax3 = fig.add_subplot(212)
g = sns.boxplot(x="default", y="balance", hue="deposit",
data=df, palette="muted", ax=ax1)
g.set_title("Amount of Balance by Term Subscriptions")
g1 = sns.boxplot(x="job", y="balance", hue="deposit",
                 data=df, palette="RdBu", ax=ax2)
# Use the labels the plot actually produced so the tick order matches the boxes
g1.set_xticklabels(g1.get_xticklabels(), rotation=90, rotation_mode="anchor")
g1.set_title("Type of Work by Term Subscriptions")
g2 = sns.violinplot(data=df, x="education", y="balance", hue="deposit", palette="RdBu_r")
g2.set_title("Distribution of Balance by Education")
plt.show()
df.head()
# Drop the Job Occupations that are "Unknown"
df = df.drop(df.loc[df["job"] == "unknown"].index)
# Admin and management are basically the same let's put it under the same categorical value
lst = [df]
for col in lst:
col.loc[col["job"] == "admin.", "job"] = "management"
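The same recoding can also be written without the list-loop idiom via `Series.replace`; a sketch on a toy column (not run against the kernel's df):

```python
import pandas as pd

jobs = pd.Series(["admin.", "technician", "admin.", "management"])
# Merge the 'admin.' category into 'management'
jobs = jobs.replace({"admin.": "management"})
print(jobs.tolist())  # ['management', 'technician', 'management', 'management']
```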
df.columns
import squarify
df = df.drop(df.loc[df["balance"] == 0].index)
x = 0
y = 0
width = 100
height = 100
job_names = df['job'].value_counts().index
values = df['job'].value_counts().tolist()
normed = squarify.normalize_sizes(values, width, height)
rects = squarify.squarify(normed, x, y, width, height)
colors = ['rgb(200, 255, 144)','rgb(135, 206, 235)',
'rgb(235, 164, 135)','rgb(220, 208, 255)',
'rgb(253, 253, 150)','rgb(255, 127, 80)',
'rgb(218, 156, 133)', 'rgb(245, 92, 76)',
'rgb(252,64,68)', 'rgb(154,123,91)']
shapes = []
annotations = []
counter = 0
for r in rects:
shapes.append(
dict(
type = 'rect',
x0 = r['x'],
y0 = r['y'],
x1 = r['x'] + r['dx'],
y1 = r['y'] + r['dy'],
line = dict(width=2),
fillcolor = colors[counter]
)
)
annotations.append(
dict(
x = r['x']+(r['dx']/2),
y = r['y']+(r['dy']/2),
text = values[counter],
showarrow = False
)
)
counter = counter + 1
if counter >= len(colors):
counter = 0
# For hover text
trace0 = go.Scatter(
x = [ r['x']+(r['dx']/2) for r in rects],
y = [ r['y']+(r['dy']/2) for r in rects],
text = [ str(v) for v in job_names],
mode='text',
)
layout = dict(
title='Number of Occupations <br> <i>(From our Sample Population)</i>',
height=700,
width=700,
xaxis=dict(showgrid=False,zeroline=False),
yaxis=dict(showgrid=False,zeroline=False),
shapes=shapes,
annotations=annotations,
hovermode='closest'
)
# With hovertext
figure = dict(data=[trace0], layout=layout)
iplot(figure, filename='squarify-treemap')
# Now let's look at the ages of subscribers by occupation
subscribed_df = df.loc[df["deposit"] == "yes"]
occupations = df["job"].unique().tolist()
# Get the ages by job, in the same order as `occupations` so the box traces pair up correctly
ages = [subscribed_df["age"].loc[subscribed_df["job"] == job].values for job in occupations]
colors = ['rgba(93, 164, 214, 0.5)', 'rgba(255, 144, 14, 0.5)',
'rgba(44, 160, 101, 0.5)', 'rgba(255, 65, 54, 0.5)',
'rgba(207, 114, 255, 0.5)', 'rgba(127, 96, 0, 0.5)',
'rgba(229, 126, 56, 0.5)', 'rgba(229, 56, 56, 0.5)',
'rgba(174, 229, 56, 0.5)', 'rgba(229, 56, 56, 0.5)']
traces = []
for xd, yd, cls in zip(occupations, ages, colors):
traces.append(go.Box(
y=yd,
name=xd,
boxpoints='all',
jitter=0.5,
whiskerwidth=0.2,
fillcolor=cls,
marker=dict(
size=2,
),
line=dict(width=1),
))
layout = go.Layout(
title='Distribution of Ages by Occupation',
yaxis=dict(
autorange=True,
showgrid=True,
zeroline=True,
dtick=5,
gridcolor='rgb(255, 255, 255)',
gridwidth=1,
zerolinecolor='rgb(255, 255, 255)',
zerolinewidth=2,
),
margin=dict(
l=40,
r=30,
b=80,
t=100,
),
paper_bgcolor='rgb(224,255,246)',
plot_bgcolor='rgb(251,251,251)',
showlegend=False
)
fig = go.Figure(data=traces, layout=layout)
iplot(fig)
# Balance Distribution
# Create a Balance Category
df["balance_status"] = np.nan
lst = [df]
for col in lst:
col.loc[col["balance"] < 0, "balance_status"] = "negative"
col.loc[(col["balance"] >= 0) & (col["balance"] <= 30000), "balance_status"] = "low"
col.loc[(col["balance"] > 30000) & (col["balance"] <= 40000), "balance_status"] = "middle"
col.loc[col["balance"] > 40000, "balance_status"] = "high"
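The same binning can be expressed with `pd.cut`; note that the intervals here are left-closed (`right=False`), so exact edge values such as 30000 land in the next bucket, a slight difference from the loop above:

```python
import pandas as pd

balances = pd.Series([-50, 200, 35000, 50000])
bins = [float("-inf"), 0, 30000, 40000, float("inf")]
labels = ["negative", "low", "middle", "high"]
status = pd.cut(balances, bins=bins, labels=labels, right=False)
print(status.tolist())  # ['negative', 'low', 'middle', 'high']
```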
# balance by balance_status
negative = df["balance"].loc[df["balance_status"] == "negative"].values.tolist()
low = df["balance"].loc[df["balance_status"] == "low"].values.tolist()
middle = df["balance"].loc[df["balance_status"] == "middle"].values.tolist()
high = df["balance"].loc[df["balance_status"] == "high"].values.tolist()
# Get the average by occupation in each balance category
job_balance = df.groupby(['job', 'balance_status'])['balance'].mean()
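Rather than hardcoding the r-values in the Barpolar traces, they can be read straight off the groupby result; a sketch with a tiny hypothetical frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "job": ["retired", "retired", "student", "student"],
    "balance_status": ["low", "high", "low", "high"],
    "balance": [400, 20000, 250, 13000],
})
# Pivot mean balances so each balance_status becomes a column; rows sort by job name
mean_balance = toy.groupby(["job", "balance_status"])["balance"].mean().unstack()
low_r = mean_balance["low"].tolist()  # per-job means to feed a 'Low Balance' trace
print(low_r)  # [400.0, 250.0]
```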
trace1 = go.Barpolar(
r=[-199.0, -392.0, -209.0, -247.0, -233.0, -270.0, -271.0, 0, -276.0, -134.5],
text=["blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed",
"services", "student", "technician", "unemployed"],
name='Negative Balance',
marker=dict(
color='rgb(246, 46, 46)'
)
)
trace2 = go.Barpolar(
r=[319.5, 283.0, 212.0, 313.0, 409.0, 274.5, 308.5, 253.0, 316.0, 330.0],
text=["blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed",
"services", "student", "technician", "unemployed"],
name='Low Balance',
marker=dict(
color='rgb(246, 97, 46)'
)
)
trace3 = go.Barpolar(
r=[2128.5, 2686.0, 2290.0, 2366.0, 2579.0, 2293.5, 2005.5, 2488.0, 2362.0, 1976.0],
text=["blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed",
"services", "student", "technician", "unemployed"],
name='Middle Balance',
marker=dict(
color='rgb(246, 179, 46)'
)
)
trace4 = go.Barpolar(
r=[14247.5, 20138.5, 12278.5, 12956.0, 20723.0, 12159.0, 12223.0, 13107.0, 12063.0, 15107.5],
text=["blue-collar", "entrepreneur", "housemaid", "management", "retired", "self-employed",
"services", "student", "technician", "unemployed"],
name='High Balance',
marker=dict(
color='rgb(46, 246, 78)'
)
)
data = [trace1, trace2, trace3, trace4]
layout = go.Layout(
title='Mean Balance in Account<br> <i> by Job Occupation</i>',
font=dict(
size=12
),
legend=dict(
font=dict(
size=16
)
),
radialaxis=dict(),
orientation=270
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='polar-area-chart')
df['marital'].value_counts()
df['marital'].unique()
df['marital'].value_counts().tolist()
vals = df['marital'].value_counts().tolist()
# Take labels from the same value_counts call so bars and names stay aligned
labels = df['marital'].value_counts().index.tolist()
data = [go.Bar(
x=labels,
y=vals,
marker=dict(
color="#FE9A2E")
)]
layout = go.Layout(
title="Count by Marital Status",
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='basic-bar')
# Distribution of Balances by Marital status
single = df['balance'].loc[df['marital'] == 'single'].values
married = df['balance'].loc[df['marital'] == 'married'].values
divorced = df['balance'].loc[df['marital'] == 'divorced'].values
single_dist = go.Histogram(
x=single,
histnorm='density',
name='single',
marker=dict(
color='#6E6E6E'
)
)
married_dist = go.Histogram(
x=married,
histnorm='density',
name='married',
marker=dict(
color='#2E9AFE'
)
)
divorced_dist = go.Histogram(
x=divorced,
histnorm='density',
name='divorced',
marker=dict(
color='#FA5858'
)
)
fig = tools.make_subplots(rows=3, print_grid=False)
fig.append_trace(single_dist, 1, 1)
fig.append_trace(married_dist, 2, 1)
fig.append_trace(divorced_dist, 3, 1)
fig['layout'].update(showlegend=False, title="Balance Distributions by Marital Status",
                     height=1000, width=800)
iplot(fig, filename='custom-sized-subplot-with-subplot-titles')
df.head()
# Notice how divorced clients have considerably lower balances.
fig = ff.create_facet_grid(
df,
x='duration',
y='balance',
color_name='marital',
show_boxes=False,
marker={'size': 10, 'opacity': 1.0},
colormap={'single': 'rgb(165, 242, 242)', 'married': 'rgb(253, 174, 216)', 'divorced': 'rgba(201, 109, 59, 0.82)'}
)
iplot(fig, filename='facet - custom colormap')
# Hmm, we have missed some important clients with high balances.
# This shouldn't be happening.
fig = ff.create_facet_grid(
df,
y='balance',
facet_row='marital',
facet_col='deposit',
trace_type='box',
)
iplot(fig, filename='facet - box traces')
df.head()
df = df.drop(df.loc[df["education"] == "unknown"].index)
df['education'].unique()
df['marital/education'] = np.nan
lst = [df]
for col in lst:
col.loc[(col['marital'] == 'single') & (df['education'] == 'primary'), 'marital/education'] = 'single/primary'
col.loc[(col['marital'] == 'married') & (df['education'] == 'primary'), 'marital/education'] = 'married/primary'
col.loc[(col['marital'] == 'divorced') & (df['education'] == 'primary'), 'marital/education'] = 'divorced/primary'
col.loc[(col['marital'] == 'single') & (df['education'] == 'secondary'), 'marital/education'] = 'single/secondary'
col.loc[(col['marital'] == 'married') & (df['education'] == 'secondary'), 'marital/education'] = 'married/secondary'
col.loc[(col['marital'] == 'divorced') & (df['education'] == 'secondary'), 'marital/education'] = 'divorced/secondary'
col.loc[(col['marital'] == 'single') & (df['education'] == 'tertiary'), 'marital/education'] = 'single/tertiary'
col.loc[(col['marital'] == 'married') & (df['education'] == 'tertiary'), 'marital/education'] = 'married/tertiary'
col.loc[(col['marital'] == 'divorced') & (df['education'] == 'tertiary'), 'marital/education'] = 'divorced/tertiary'
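Since the combined label is just the two columns joined with '/', the nine-branch loop above can be collapsed into a single string concatenation; a sketch on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({"marital": ["single", "married", "divorced"],
                    "education": ["primary", "tertiary", "secondary"]})
# Concatenate the two categorical columns into one combined label
toy["marital/education"] = toy["marital"] + "/" + toy["education"]
print(toy["marital/education"].tolist())
# ['single/primary', 'married/tertiary', 'divorced/secondary']
```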
df.head()
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="marital/education", hue="marital/education", aspect=12, palette=pal)
g.map(sns.kdeplot, "balance", clip_on=False, shade=True, alpha=1, lw=1.5, bw=.2)
g.map(sns.kdeplot, "balance", clip_on=False, color="w", lw=1, bw=0)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
education_groups = df.groupby(['marital/education'], as_index=False)['balance'].median()
fig = plt.figure(figsize=(12,8))
sns.barplot(x="balance", y="marital/education", data=education_groups,
label="Total", palette="RdBu")
plt.title('Median Balance by Educational/Marital Group', fontsize=16)
# Let's see the group who had loans from the marital/education group
loan_balance = df.groupby(['marital/education', 'loan'], as_index=False)['balance'].median()
no_loan = loan_balance['balance'].loc[loan_balance['loan'] == 'no'].values
has_loan = loan_balance['balance'].loc[loan_balance['loan'] == 'yes'].values
labels = loan_balance['marital/education'].unique().tolist()
trace0 = go.Scatter(
x=no_loan,
y=labels,
mode='markers',
name='No Loan',
marker=dict(
color='rgb(175,238,238)',
line=dict(
color='rgb(0,139,139)',
width=1,
),
symbol='circle',
size=16,
)
)
trace1 = go.Scatter(
x=has_loan,
y=labels,
mode='markers',
name='Has a Previous Loan',
marker=dict(
color='rgb(250,128,114)',
line=dict(
color='rgb(178,34,34)',
width=1,
),
symbol='circle',
size=16,
)
)
data = [trace0, trace1]
layout = go.Layout(
title="The Impact of Loans on Marital/Educational Clusters",
xaxis=dict(
showgrid=False,
showline=True,
linecolor='rgb(102, 102, 102)',
titlefont=dict(
color='rgb(204, 204, 204)'
),
tickfont=dict(
color='rgb(102, 102, 102)',
),
showticklabels=False,
dtick=10,
ticks='outside',
tickcolor='rgb(102, 102, 102)',
),
margin=dict(
l=140,
r=40,
b=50,
t=80
),
legend=dict(
font=dict(
size=10,
),
yanchor='middle',
xanchor='right',
),
width=1000,
height=800,
paper_bgcolor='rgb(255,250,250)',
plot_bgcolor='rgb(255,255,255)',
hovermode='closest',
)
fig = go.Figure(data=data, layout=layout)
iplot(fig, filename='lowest-oecd-votes-cast')
df.head()
import seaborn as sns
sns.set(style="ticks")
sns.pairplot(df, hue="marital/education", palette="Set1")
plt.show()
df.head()
fig = plt.figure(figsize=(12,8))
sns.violinplot(x="balance", y="job", hue="deposit", palette="RdBu_r",
data=df);
plt.title("Job Distribution of Balances by Deposit Status", fontsize=16)
plt.show()
The main purpose of a confusion matrix is to see how our model performs when classifying potential clients likely to subscribe to a term deposit. The confusion matrix contains four terms: True Positives, False Positives, True Negatives, and False Negatives.
Positive/Negative: Type of Class (label) ["No", "Yes"]
True/False: Correctly or Incorrectly classified by the model.
True Negatives (Top-Left Square): the number of correct classifications of the "No" class, i.e., potential clients who are not willing to subscribe to a term deposit and were predicted as such.
False Positives (Top-Right Square): the number of actual "No" instances that were incorrectly classified as "Yes".
False Negatives (Bottom-Left Square): the number of actual "Yes" instances (potential clients willing to subscribe to a term deposit) that were incorrectly classified as "No".
True Positives (Bottom-Right Square): the number of correct classifications of the "Yes" class, i.e., potential clients who are willing to subscribe to a term deposit.
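A tiny sklearn sketch (with made-up labels, not the kernel's actual predictions) confirms this layout when the label order is ['no', 'yes']:

```python
from sklearn.metrics import confusion_matrix

y_true = ["no", "no", "yes", "yes", "no"]
y_pred = ["no", "yes", "yes", "no", "no"]

# Rows are actual labels, columns are predicted labels
cm = confusion_matrix(y_true, y_pred, labels=["no", "yes"])
print(cm)  # [[TN, FP], [FN, TP]] -> [[2, 1], [1, 1]]
```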
Recall: of all the actual "Yes" labels in the dataset, how many does our model detect? (TP / (TP + FN))
Precision: of all the instances the model predicts as "Yes", how many actually are "Yes"? (TP / (TP + FP))
As the precision gets higher, the recall gets lower, and vice versa. For instance, if we raise the decision threshold from 30% to 60%, the model only predicts "Yes" when it believes an instance is at least 60% likely to be one. An instance the model scores at 58% will then be classified as "No", even if that potential client actually did subscribe to a term deposit. That is why the higher the precision, the more likely the model is to miss instances that are actually "Yes"!
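The threshold trade-off can be seen on made-up scores (toy values, not the model's actual outputs):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.2, 0.4, 0.55, 0.7, 0.45, 0.6, 0.9, 0.3])

# As the threshold rises, precision rises while recall falls
for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    print(threshold, precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```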
The ROC curve tells us how well our classifier separates term deposit subscriptions (positives) from non-subscriptions (negatives). The X-axis is the False Positive Rate (1 - specificity) and the Y-axis is the True Positive Rate (sensitivity). As the classification threshold changes, the point on the curve moves, giving us different values. The closer the curve is to the top-left corner, the better our model separates the two classes.
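A minimal sketch of tracing an ROC curve and its area with sklearn, on toy scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Points on the ROC curve as the decision threshold sweeps
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print(auc)  # 0.75 -- area under the curve; 1.0 would be a perfect separator
```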
The top three most important features for our classifier are **duration** (how long the conversation between the sales representative and the potential client lasted), **contact** (the communication channel used to reach the potential client), and **month** (the month of the year).
The Gradient Boosting classifier is the best model for predicting whether a potential client will subscribe to a term deposit: 84% accuracy!
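A minimal sketch of the kind of pipeline behind that number, on synthetic data with default hyperparameters (so the accuracy here is illustrative, not the kernel's 84%):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
# Stratified split keeps the yes/no ratio equal in train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

clf = GradientBoostingClassifier(random_state=42).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(acc)  # test-set accuracy
```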
1) Months of Marketing Activity: The month with the highest level of marketing activity was May. However, this was also the month in which potential clients tended to reject term deposit offers (lowest effective rate: -34.49%). For the next marketing campaign, it would be wise for the bank to focus on March, September, October, and December. (December should still be treated with caution, because it was the month with the lowest marketing activity; there might be a reason why.)
2) Seasonality: Potential clients opted to subscribe to term deposits during fall and winter. The next marketing campaign should focus its activity on these seasons.
3) Campaign Calls: A policy should be implemented stating that no more than 3 calls should be made to the same potential client, in order to save the time and effort better spent on finding new potential clients. Remember: the more we call the same potential client, the more likely he or she is to decline to open a term deposit.
4) Age Category: The next marketing campaign should target potential clients in their 20s or younger and in their 60s or older. The youngest category had a 60% chance of subscribing to a term deposit, while the eldest had a 76% chance. Addressing these two categories in the next campaign should increase the likelihood of more term deposit subscriptions.
5) Occupation: Not surprisingly, potential clients who were students or retired were the most likely to subscribe to a term deposit. Retired individuals tend to hold term deposits in order to earn cash through interest payments. Remember, with a term deposit the individual (in this case the retired person) agrees not to withdraw the cash from the bank until a maturity date agreed with the financial institution; after that date the individual gets the capital back plus the interest earned. Retired individuals tend not to spend their cash heavily, so they are more likely to put it to work by lending it to the financial institution. Students were the other group that tended to subscribe to term deposits.
6) House Loans and Balances: Potential clients in the low and negative balance categories were more likely to have a housing loan than those in the average and high balance categories. What does having a housing loan mean? It means the potential client has a financial commitment to pay back the loan, leaving no spare cash to put into a term deposit. Potential clients with average and high balances, by contrast, are less likely to have a housing loan and therefore more likely to open a term deposit. The next marketing campaign should therefore focus on individuals with average and high balances, in order to increase the likelihood of term deposit subscriptions.
7) Develop a Questionnaire for the Calls: Since call duration is the feature that correlates most positively with whether a potential client will open a term deposit, providing an engaging questionnaire during the calls might increase conversation length. Of course, this does not guarantee that the potential client will subscribe! Nevertheless, we lose nothing by implementing a strategy that increases the potential client's engagement, raising the probability of a subscription and therefore the effectiveness of the bank's next marketing campaign.
By combining all these strategies and narrowing the audience the next campaign should address, the bank's next marketing campaign is likely to be more effective than the current one.
1) Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.
2) Special thanks to Ahmet Taspinar (insights on a handful of functions). Link: http://ataspinar.com/2017/05/26/classification-with-scikit-learn/
3) Special Thanks to Randy Lao and his Predicting Employee KernelOver work. Link: https://www.kaggle.com/randylaosat/predicting-employee-kernelover